Description

This notebook contains analysis of mRNA, microRNA and DNAm data. mRNA data is previously unpublished, although MiR and DNAm data is: MiR: https://pubmed.ncbi.nlm.nih.gov/30783190/ DNAm: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7554960/ This is the first notebook that generates results required for further analysis so it should be ran first. The information about libraries and versions used while writing this notebook are provided in the bottom via sessionInfo() function.
The aim in this project is to make an multi-omics integrative analysis of HNSC tumor samples. High-throughput sequencing of mRNA and microRNA was preformed. Exploratory data analysis and differential expression analysis was preformed according to the DESeq2 vignette. DNAm data was generated Illumina Infinium Methylation EPIC BeadChip that contains ~900k CpG’s.

Libraries

library(rmarkdown)
library(BiocManager)
library(GenomicFeatures)
library(stringr)
library(tximport)
library(DESeq2)
library(pheatmap)
library(RColorBrewer)
library(ggplot2)
library(ggrepel)
library(gprofiler2)
library(RnBeads)
library(RnBeads.hg19)
library(readxl)
library(plyranges)

source(file = file.path(getwd(), "scripts", "idMap.R"))
source(file = file.path(getwd(), "scripts", "triangleHeatMap.R"))

mRNA

In this chapter Data analysis of mRNA data is preformed.

Raw data

Raw data was provided by Sabol and dowloaded from cloud:
https://irbhr-my.sharepoint.com/:f:/g/personal/ivan_sabol_irb_hr/Eq5MRE0xi8pNuNOuwWQ-vb0BT6MaSxwmp97PKUF97YRoxg?e=gA69Fb
Data demultiplexing was preformed with script:

Building index with: For RNA seq mapping index is built from the gencode transcirpt data from hg19 assembly: https://www.gencodegenes.org/human/release_19.html The bash call for salmon to build the index is this:

salmon index -t gencode.v19.pc_transcripts.fa.gz -i homosapiens_hg19_gencode_transcripts

Mapping with Salmon. Mapping was done on a cluster with salmon tool. the Folder has to contain folder with data (mRNA_data) and the index file that was built. Data is saved in individual folders that are named OV401, OV402, … OV413.

for fn in mRNA_data/OV{401..413};
do
samp=`basename ${fn}`
echo "Processing sample ${samp}"
salmon quant -i homosapiens_hg19_gencode_transcripts -l A -p 6 --validateMappings --gcBias --numGibbsSamples 20 -o quants/${samp}_quant -1 ${fn}/${samp}_1.fastq.gz -2 ${fn}/${samp}_2.fastq.gz
done 

Loading data

#reading coldata 
dir <- file.path(getwd(), "inputs")
mRNA_coldata <- 
  read.csv(file.path(dir, "mRNA_sample_table.csv"),
           row.names=1, sep=",", 
           stringsAsFactors=FALSE)

#adding file path of quants for tximport
mRNA_coldata$files <- 
  file.path(dir,
            "mRNA_quants_hg19_new",
            paste0(mRNA_coldata$names,"_quant"),
            "quant.sf")

#Check if everything is okay
file.exists(mRNA_coldata$files)
##  [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE

Tximport
This function maps transcript id’s to gene id’s. Firstly we make an txdb object from the gff3 file. Then build an tx2gene object which contains ensembl gene ids and txids from salmon. After tx2gene object is built we input it in the tximport function. Next we can build an DESeq object wiht DESeqDataSetFromTximport function. gff3 file - https://www.gencodegenes.org/human/release_19.html

#fetching database for mapping gene id to transcript name
txdb <- makeTxDbFromGFF(file.path(dir, "gencode.v19.annotation.gff3.gz"))

k <- keys(txdb, keytype = "TXNAME")
tx2gene <- AnnotationDbi::select(txdb,k,"GENEID", "TXNAME")

saveRDS(tx2gene, file = file.path(getwd(), "outputs/RDS/tx2gene.rds"))

Separating these two chunks to make the code faster

tx2gene <- readRDS(file.path(getwd(), "outputs/RDS/tx2gene.rds"))

#parsing txnames
tx2gene$TXNAME <- str_split(tx2gene$TXNAME,pattern = "\\.",
                            simplify =T)[,1]


# importing quantification data
txi <- tximport(mRNA_coldata$files,
                type="salmon", 
                tx2gene=tx2gene,
                ignoreTxVersion=T,
                ignoreAfterBar=T)

Building dds object
dds (DeseqDataSet) objects are custom DESeq class of SummarizedExperiments class from Bioconductor. Also making a minimum filter to filter out

# building mRNA dds object
mRNA_dds <- 
  DESeqDataSetFromTximport(txi, mRNA_coldata, design= ~ Hist + Tissue + HPV)
## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors
# trimming rownames to remove the suffix
rownames(mRNA_dds) <- str_split(rownames(mRNA_dds),"\\.", simplify = T)[,1]

# adding colnames
colnames(mRNA_dds) <- mRNA_coldata$Sample.ID

# filtering rows low counts
keep <- rowSums(counts(mRNA_dds)) > 10
mRNA_dds <- mRNA_dds[keep,]

dim(mRNA_dds)
## [1] 18104    13
head(counts(mRNA_dds))
##                 KBD2 KBD3 KBD10 KBD11 KBD13 KBD15 KBD16 KBD17 KBD18 KBD21 RNA3T
## ENSG00000000003  150  642   325   160   310   694   397   398   857   404   167
## ENSG00000000005    0    1     0     0     3     0     1     0     0     1     7
## ENSG00000000419  118  611   200   486   392   589   478   392   258   595   275
## ENSG00000000457  106  241   192   169   218   278   199   301   289   315   311
## ENSG00000000460  131  305   133   132   107   490   248   344   177   372   131
## ENSG00000000938  120  226   283   228   762   154   243   164   331   374   541
##                 RNA10T RNA16T
## ENSG00000000003    206    131
## ENSG00000000005      0     23
## ENSG00000000419    337    102
## ENSG00000000457    491    107
## ENSG00000000460    213     60
## ENSG00000000938   1002     34

Exploratory analysis

To do an inter sample exploratory analysis, a statistical method for stabilizing variance is preformed.

mRNA_rld <- DESeq2::rlog(mRNA_dds)

PCA plot
PCA plot converts the correlations (or lack there of) among all of the samples into a 2-D graph. Highly correlated samples cluster together. It is a version of a dimension reduction method. Also, the axis are ordered by importanc or how much of the variance in the data they represent.

# selecting data for plotting pca
mRNA_pcadata <- 
  DESeq2::plotPCA(mRNA_rld, 
                  intgroup = c("Hist", "HPV", "Tissue"),
                  returnData = TRUE)
# selecting variance
percentVar <- round(100 * attr(mRNA_pcadata, "percentVar"))

# plotting pca data
ggplot(mRNA_pcadata, aes(x = PC1, y = PC2, color = HPV, shape = Hist)) +
  geom_point(size =3) +
  xlab(paste0("PC1: ", percentVar[1], "% variance")) +
  ylab(paste0("PC2: ", percentVar[2], "% variance")) +
   theme(panel.grid = element_blank(), 
         panel.background = element_rect(fill = "white"), 
  panel.border = element_rect(fill = "transparent"))+
  coord_fixed() +
  ggtitle("PCA with rlog data")+
  scale_color_manual(values = c("firebrick1","forestgreen","blue1"))+
  geom_text_repel(aes(label = colData(mRNA_rld)$Sample.ID), size =3)

Sample distance To asses overall similarity between samples we calculate sample distance using the R function dist to calculate the Euclidean distance.

Triangle sample distance heatmap

mRNA_annot_row <- as.data.frame(colData(mRNA_rld)[,c("Hist","Tissue", "HPV")])

# Calling the local funciton
triangleHeatMap(assay(mRNA_rld),
                annot_row = mRNA_annot_row)
## Found more than one class "dist" in cache; using the first, from namespace 'BiocGenerics'
## Also defined by 'spam'
## Found more than one class "dist" in cache; using the first, from namespace 'BiocGenerics'
## Also defined by 'spam'

Differential expression analysis

The call to DESeq function preforms the estimation of size factors, estimation of dispersion values and fits a generalized linear model.

mRNA_dds <- DESeq(mRNA_dds, betaPrior = T)

Differential expression on cancer control group

mRNA_res <- results(mRNA_dds, contrast = c("Hist","cancer","normal"))

mRNA_res
## log2 fold change (MAP): Hist cancer vs normal 
## Wald test p-value: Hist cancer vs normal 
## DataFrame with 18104 rows and 6 columns
##                  baseMean log2FoldChange     lfcSE       stat     pvalue
##                 <numeric>      <numeric> <numeric>  <numeric>  <numeric>
## ENSG00000000003 364.56985       0.585192  0.618222   0.946572 0.34385672
## ENSG00000000005   5.82988      -3.843846  1.130315  -3.400686 0.00067217
## ENSG00000000419 353.89118       0.523721  0.425025   1.232213 0.21786970
## ENSG00000000457 241.36876      -0.426076  0.479682  -0.888247 0.37440769
## ENSG00000000460 205.29988       0.327594  0.476669   0.687256 0.49192119
## ...                   ...            ...       ...        ...        ...
## ENSG00000273291   1.19174      0.0464801  1.194795  0.0389021   0.968968
## ENSG00000273294   6.89812      1.9531712  1.194128  1.6356458   0.101914
## ENSG00000273331   4.19184      1.5054733  1.192183  1.2627868   0.206666
## ENSG00000273398  21.52493     -0.9066724  1.032461 -0.8781663   0.379853
## ENSG00000273439  13.65698     -0.2950699  0.654476 -0.4508493   0.652098
##                       padj
##                  <numeric>
## ENSG00000000003 0.55418382
## ENSG00000000005 0.00930593
## ENSG00000000419 0.42149623
## ENSG00000000457 0.58019653
## ENSG00000000460 0.67826924
## ...                    ...
## ENSG00000273291         NA
## ENSG00000273294   0.267233
## ENSG00000273331   0.408695
## ENSG00000273398   0.584976
## ENSG00000273439   0.795028

Some summary statistic

summary(mRNA_res, alpha=0.05)
## 
## out of 18104 with nonzero total read count
## adjusted p-value < 0.05
## LFC > 0 (up)       : 1081, 6%
## LFC < 0 (down)     : 1448, 8%
## outliers [1]       : 1182, 6.5%
## low counts [2]     : 350, 1.9%
## (mean count < 1)
## [1] see 'cooksCutoff' argument of ?results
## [2] see 'independentFiltering' argument of ?results
mRNA_DE <- filter(as.data.frame(mRNA_res), padj < 0.05)

summary(mRNA_DE)
##     baseMean         log2FoldChange        lfcSE             stat         
##  Min.   :     2.14   Min.   :-6.8618   Min.   :0.2436   Min.   :-10.0581  
##  1st Qu.:    67.48   1st Qu.:-2.6141   1st Qu.:0.5311   1st Qu.: -3.5335  
##  Median :   256.60   Median :-1.5892   Median :0.6630   Median : -2.8231  
##  Mean   :  1325.16   Mean   :-0.5309   Mean   :0.6841   Mean   : -0.5478  
##  3rd Qu.:   803.43   3rd Qu.: 1.8801   3rd Qu.:0.8192   3rd Qu.:  3.1543  
##  Max.   :181417.54   Max.   : 7.1293   Max.   :1.1917   Max.   :  8.3316  
##      pvalue               padj         
##  Min.   :0.000e+00   Min.   :0.000000  
##  1st Qu.:7.652e-05   1st Qu.:0.002003  
##  Median :8.098e-04   Median :0.010608  
##  Mean   :1.799e-03   Mean   :0.015340  
##  3rd Qu.:3.017e-03   3rd Qu.:0.026357  
##  Max.   :7.611e-03   Max.   :0.049870
hist(mRNA_DE$log2FoldChange, border="white", col="darkgreen")

hist(log(mRNA_DE$baseMean), border="white", col="darkblue")

GSEA on DE mRNA In the next chunk gene set enrichment analysis is preformed on DE mRNA data

# Using only part of sources
GSEA_SOURCES <- c("GO:BP","CORUM","KEGG","REAC","WP","MIRNA", "TF")


# Call on the gost 
gost_mRNA <- 
  gost(rownames(mRNA_DE),
       organism= "hsapiens",
       exclude_iea=TRUE,
       domain_scope="annotated",
       ordered_query=F,
       sources=GSEA_SOURCES)

# Change query name
gost_mRNA$result$query <- "DE genes vs genome"

#Plot and table of significant results
gostplot(gost_mRNA ,capped = F, interactive = T)

DE genes on background. Results are similar, which leads to conclude that all sequenced data is covering most of the genes.

# Call on the gost 
gost_mRNA <- 
  gost(rownames(mRNA_DE),
       organism= "hsapiens",
       exclude_iea=TRUE,
       domain_scope="custom",
       custom_bg=rownames(mRNA_res),
       ordered_query=F,
       sources=GSEA_SOURCES)

# Change query name
gost_mRNA$result$query <- "DE genes vs sequenced genes"

#Plot and table of significant results
gostplot(gost_mRNA ,capped = F, interactive = T)

HPV effect on cancer In the next chunk explores the effect of HPV on cancer. To be more precise, it explores which genes are DE in HPV in cancer. Firstly compare HPV - and controls to get a list of genes that re DE in cancer samples. Secondly, compare HPV + and HPV - samples and exclude genes from the previous comparison.

# making another dds object with different design
mRNA_dds_HPV <- mRNA_dds

# New column in colData for design
colData(mRNA_dds_HPV)$HistHPV <-
  as.factor(paste0(colData(mRNA_dds_HPV)$Hist,colData(mRNA_dds_HPV)$HPV))

# setting the HistHPV des
design(mRNA_dds_HPV) =~ HistHPV

# Running the DESeq function
mRNA_dds_HPV <- DESeq(mRNA_dds_HPV, betaPrior = T)

# Contrasting between negative HPV and controls
res.cN.nN <- 
  results(mRNA_dds_HPV, contrast=c("HistHPV","cancerN","normalN"), tidy = T)
# Contrasting between HPV negative and positive
res.cP.cN <- 
  results(mRNA_dds_HPV, contrast=c("HistHPV","cancerP","cancerN"), tidy = T)

# selecting significant results
res.cP.cN <- dplyr::filter(res.cP.cN, padj < 0.1)
res.cN.nN <- dplyr::filter(res.cN.nN, padj < 0.1)

#  
mRNA_HPV <- res.cP.cN[!is.element(res.cP.cN$row, res.cN.nN$row),]
mRNA_HPV$symbol <- idMap(mRNA_HPV$row)
## [1] "IDTABLE reading from filename: /home/katarina/Projects/HNSCC/scripts/IDTABLE_hg19.rds"
head(mRNA_HPV)
##                row   baseMean log2FoldChange     lfcSE     stat       pvalue
## 1  ENSG00000006047  98.943311       2.732856 0.7401323 3.692389 2.221573e-04
## 2  ENSG00000007038 297.657431       3.560497 0.8611900 4.134392 3.558962e-05
## 3  ENSG00000008196   5.563191       3.648879 0.9743384 3.744981 1.804076e-04
## 7  ENSG00000077935  52.302104       3.863428 0.8109130 4.764294 1.895156e-06
## 10 ENSG00000100285 305.883103       5.204755 0.7290451 7.139140 9.391694e-13
## 11 ENSG00000102387  22.342732       4.045055 0.8866892 4.561976 5.067450e-06
##            padj symbol
## 1  7.274828e-02   YBX2
## 2  2.387378e-02 PRSS21
## 3  6.137155e-02 TFAP2B
## 7  3.259949e-03  SMC1B
## 10 1.660733e-08   NEFH
## 11 7.467310e-03  TAF7L

GSEA on HPV effect
Gene set analysis on HPV related DE mRNA.

gost_HPV <- 
  gost(mRNA_HPV$row,
       organism= "hsapiens",
       exclude_iea=TRUE,
       ordered_query=F,
       significant=F,
       sources=GSEA_SOURCES)

# Change query name
gost_HPV$result$query <- "DE HPV related genes vs genome"

#Plot and table of significant results
gostplot(gost_HPV ,capped = F, interactive = T)

Volcano plot

volcano <- mRNA_res
LFC.TRESH <- 1
PVAL.TRESH <- 0.05

# Building volcano plot
volcano <- as.data.frame(mRNA_res) %>% 
  dplyr::filter(!is.na(padj)) %>% #filtering NA's
  dplyr::mutate(diffexpressed = dplyr::case_when(
    padj < PVAL.TRESH & log2FoldChange > LFC.TRESH ~ "UP",
    padj < PVAL.TRESH & log2FoldChange < -LFC.TRESH ~ "DOWN",
    TRUE ~ "NO")) 

# Getting hgnc symbols for plotting
volcano$hgnc_symbol <- idMap(rownames(volcano))
## [1] "IDTABLE reading from filename: /home/katarina/Projects/HNSCC/scripts/IDTABLE_hg19.rds"
## [1] "output length is different than input, check df manualy"
# Volcano colors
volcano_cols <- c("red", "gray60", "green")

# p value tresh for delabeling
p_volcano_tresh <- arrange(volcano,padj)[12, "padj"]

volcano <- volcano %>%
  mutate(delabel = ifelse(padj <= p_volcano_tresh,
                          hgnc_symbol,
                          NA))
  

ggplot(data=as.data.frame(volcano), 
       aes(x=log2FoldChange, 
           y=-log10(padj), 
           col=diffexpressed))+
  geom_point()+
  theme_minimal()+
  scale_color_manual(values = volcano_cols)

ggplot(data=as.data.frame(volcano), 
       aes(x=log2FoldChange, 
           y=-log10(padj), 
           col=diffexpressed))+
  geom_point()+
  theme_minimal()+
  scale_color_manual(values = volcano_cols)+
  geom_text_repel(aes(label = delabel), show.legend = FALSE,
                  size =3, max.overlaps = 50)
## Warning: Removed 16560 rows containing missing values (geom_text_repel).

Counts plot
Visualizing counts for genes

# Using the list of genes that are labeled in volcano plot
genes <- rownames(filter(volcano, !is.na(delabel)))

# Building an object for multiple counts plot
op <- par(mfrow = c(3,4),
          oma = c(2,2,0,0) + 1,
          mar = c(0,0,1,1) + 1.2)
# Plotting
for(gene in genes){
  plotCounts(mRNA_dds, gene = gene, intgroup = c("Hist"),
             returnData = F, pch = 19, main=idMap(gene))
}
## [1] "IDTABLE reading from filename: /home/katarina/Projects/HNSCC/scripts/IDTABLE_hg19.rds"
## [1] "IDTABLE reading from filename: /home/katarina/Projects/HNSCC/scripts/IDTABLE_hg19.rds"
## [1] "IDTABLE reading from filename: /home/katarina/Projects/HNSCC/scripts/IDTABLE_hg19.rds"
## [1] "IDTABLE reading from filename: /home/katarina/Projects/HNSCC/scripts/IDTABLE_hg19.rds"
## [1] "IDTABLE reading from filename: /home/katarina/Projects/HNSCC/scripts/IDTABLE_hg19.rds"
## [1] "IDTABLE reading from filename: /home/katarina/Projects/HNSCC/scripts/IDTABLE_hg19.rds"
## [1] "IDTABLE reading from filename: /home/katarina/Projects/HNSCC/scripts/IDTABLE_hg19.rds"
## [1] "IDTABLE reading from filename: /home/katarina/Projects/HNSCC/scripts/IDTABLE_hg19.rds"
## [1] "IDTABLE reading from filename: /home/katarina/Projects/HNSCC/scripts/IDTABLE_hg19.rds"
## [1] "IDTABLE reading from filename: /home/katarina/Projects/HNSCC/scripts/IDTABLE_hg19.rds"
## [1] "IDTABLE reading from filename: /home/katarina/Projects/HNSCC/scripts/IDTABLE_hg19.rds"

## [1] "IDTABLE reading from filename: /home/katarina/Projects/HNSCC/scripts/IDTABLE_hg19.rds"
par(op)

miR

Raw data

Raw sequencing data was previously analyzed and published. Since it was approved this notebook will start with a count matrix that was mapped with smallRNA Basepace application. It was also provided by Sabol and is located in the cloud linked in the mRNA Raw data section.

Loading data

The matrix contains only microRNA counts. For input the counts table was modified. The modification included deleting the second row which contains the information about the sample group and the column names were renamed to an unique id (eg. from Grce_17trimmed to RNA3T). Also, a 0 was added for the first column to denote an column for rownames. The modified file is miR_counts_matrix.txt

# count matrix
mir_countdata <- read.csv("inputs/miR_counts_matrix.txt", row.names=1)

# coldata
mir_coldata <- read.csv("inputs/miR_sample_table.csv",
                    row.names = 1,
                    sep = ";",
                    stringsAsFactors = FALSE)

#coldata$HPV <- as.factor(coldata$HPV)

#setting an names variable that is used in other DESeq2 functions
mir_coldata$names <- mir_coldata$Sample.ID

Excluding samples
Here we exclude samples with sampleID KBD9 and KBD25 since they are outliers as per paper Božinović et al.

keep <- mir_coldata$names != "KBD9" & mir_coldata$names != "KBD25"
mir_coldata <- mir_coldata [keep,]

keep <- colnames(mir_countdata) != "KBD9" & colnames(mir_countdata) != "KBD25"
mir_countdata <- mir_countdata [,keep]

Creating a dds object The aim of this experiment is to explore differentially expressed microRNA genes. From this data we can explore difference between cancer and normal, HPV pos and neg samples and different tissue samples. This notebook demonstrates differential expression analysis on the design Tissue + HPV.

mir_dds <- DESeqDataSetFromMatrix(countData = mir_countdata, 
                                       colData = mir_coldata, 
                                       design = ~Hist + Tissue + HPV)
## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors
# filtering low count rows
dim(mir_dds)
## [1] 5789   20
keep <- rowSums(counts(mir_dds)) > 10
mir_dds <- mir_dds[keep,]
dim(mir_dds)
## [1] 3706   20

Exploratory analysis

Variance transformation is a statistical method for multidimensional data so it becomes homoskedastic. This data can be analyzed through clustering and principal component analysis.
rlog transformation Regularized logartihm (rlog) transformation behaves similarly to the log2 transformation. It is slower than the VST but recomended for small datasets (n<30).

mir_rld <- rlog(mir_dds)

Sample distance To asses overall similarity between samples we calculate sample distance using the R function dist to calculate the Euclidean distance.

Triangle sample distance heatmap

mir_annot_row <- as.data.frame(colData(mir_rld)[,c("Hist","Tissue", "HPV")])

triangleHeatMap(assay(mir_rld),
                color = "Reds",
                annot_row = mir_annot_row,
                triange.diagonal = F)

# selecting data for plotting pca
mRNA_pcadata <- 
  DESeq2::plotPCA(mir_rld, 
                  intgroup = c("Hist", "HPV", "Tissue"),
                  returnData = TRUE)
# selecting variance
percentVar <- round(100 * attr(mRNA_pcadata, "percentVar"))

# plotting pca data
ggplot(mRNA_pcadata, aes(x = PC1, y = PC2, color = HPV, shape = Hist)) +
  geom_point(size =3) +
  xlab(paste0("PC1: ", percentVar[1], "% variance")) +
  ylab(paste0("PC2: ", percentVar[2], "% variance")) +
   theme(panel.grid = element_blank(), 
         panel.background = element_rect(fill = "white"), 
  panel.border = element_rect(fill = "transparent"))+
  coord_fixed() +
  ggtitle("PCA with rlog data")+
  scale_color_manual(values = c("firebrick1","forestgreen","blue1"))+
  geom_text_repel(aes(label = colData(mir_rld)$Sample.ID), size =3)

Differential expression analysis

The call to DESeq function preforms the estimation of size factors, estimation of dispersion values and fits a generalized linear model.

mir_dds <- DESeq(mir_dds, betaPrior = T)

Differential expression on cancer control group

mir_res <- results(mir_dds, contrast = c("Hist","cancer","normal"))

mir_res
## log2 fold change (MAP): Hist cancer vs normal 
## Wald test p-value: Hist cancer vs normal 
## DataFrame with 3706 rows and 6 columns
##                                          baseMean log2FoldChange     lfcSE
##                                         <numeric>      <numeric> <numeric>
## hsa-let-7a-5p                         4.01521e+04       0.788898  0.315919
## hsa-let-7a-3p                         3.51211e+01       0.727370  0.488667
## hsa-let-7a-2-3p                       3.72577e-01      -0.938634  1.299768
## hsa-let-7b-5p                         4.77149e+03       1.072733  0.436635
## hsa-let-7c-5p                         2.61070e+03      -0.577264  0.691565
## ...                                           ...            ...       ...
## 9_TCTGATCAGGCAAAATTGCAGA_hsa-mir-4772    1.531377      1.3809634   1.14758
## 9_TGCAGGAACTTGTGAGTCTC_hsa-mir-873       1.348568      0.5957587   1.31484
## 9_TGCAGGAACTTGTGAGTCTCC_hsa-mir-873      2.164927      1.4813945   1.25393
## 9_TGCAGGAACTTGTGAGTCTCCT_hsa-mir-873     2.815976      0.8249337   1.25966
## 9_TGCCCCAACAAGGAAGGACA_hsa-mir-5699      0.973761     -0.0321778   1.08451
##                                             stat    pvalue      padj
##                                        <numeric> <numeric> <numeric>
## hsa-let-7a-5p                           2.497153 0.0125195 0.0509186
## hsa-let-7a-3p                           1.488479 0.1366247 0.2753381
## hsa-let-7a-2-3p                        -0.722155 0.4701992        NA
## hsa-let-7b-5p                           2.456819 0.0140173 0.0553042
## hsa-let-7c-5p                          -0.834722 0.4038745 0.5770754
## ...                                          ...       ...       ...
## 9_TCTGATCAGGCAAAATTGCAGA_hsa-mir-4772  1.2033737  0.228832        NA
## 9_TGCAGGAACTTGTGAGTCTC_hsa-mir-873     0.4531023  0.650475        NA
## 9_TGCAGGAACTTGTGAGTCTCC_hsa-mir-873    1.1814031  0.237443        NA
## 9_TGCAGGAACTTGTGAGTCTCCT_hsa-mir-873   0.6548868  0.512541  0.670913
## 9_TGCCCCAACAAGGAAGGACA_hsa-mir-5699   -0.0296704  0.976330        NA

Some summary statistic

summary(mir_res, alpha=0.05)
## 
## out of 3706 with nonzero total read count
## adjusted p-value < 0.05
## LFC > 0 (up)       : 290, 7.8%
## LFC < 0 (down)     : 258, 7%
## outliers [1]       : 43, 1.2%
## low counts [2]     : 1422, 38%
## (mean count < 3)
## [1] see 'cooksCutoff' argument of ?results
## [2] see 'independentFiltering' argument of ?results

Filtering BH adjusted pvalue results

mir_DE <- filter(as.data.frame(mir_res), padj < 0.05)

summary(mir_DE)
##     baseMean        log2FoldChange        lfcSE             stat        
##  Min.   :    2.66   Min.   :-4.8585   Min.   :0.2554   Min.   :-6.1428  
##  1st Qu.:   17.43   1st Qu.:-1.9653   1st Qu.:0.5223   1st Qu.:-3.2635  
##  Median :   60.70   Median : 1.4146   Median :0.6422   Median : 2.5797  
##  Mean   :  962.57   Mean   : 0.3375   Mean   :0.6735   Mean   : 0.3407  
##  3rd Qu.:  300.83   3rd Qu.: 2.5776   3rd Qu.:0.7872   3rd Qu.: 3.4037  
##  Max.   :95650.32   Max.   : 5.3210   Max.   :1.3206   Max.   : 8.5263  
##      pvalue               padj         
##  Min.   :0.0000000   Min.   :0.000000  
##  1st Qu.:0.0001065   1st Qu.:0.001731  
##  Median :0.0008250   Median :0.006730  
##  Mean   :0.0024880   Mean   :0.012903  
##  3rd Qu.:0.0037716   3rd Qu.:0.020552  
##  Max.   :0.0121402   Max.   :0.049646
hist(mir_DE$log2FoldChange, border="white", col="darkgreen")

hist(log(mir_DE$baseMean), border="white", col="darkblue")

HPV effect on cancer
In the next chunk explores the effect of HPV on cancer. To be more precise, it explores which genes are DE in HPV in cancer. Firstly compare HPV - and controls to get a list of genes that re DE in cancer samples. Secondly, compare HPV + and HPV - samples and exclude genes from the previous comparison.

# making another dds object with different design
mir_dds_HPV <- mir_dds

# New column in colData for design
colData(mir_dds_HPV)$HistHPV <-
  as.factor(paste0(colData(mir_dds_HPV)$Hist,colData(mir_dds_HPV)$HPV))

# setting the HistHPV des
design(mir_dds_HPV) =~ HistHPV

# Running the DESeq function
mir_dds_HPV <- DESeq(mir_dds_HPV, betaPrior = T)

# Contrasting between negative HPV and controls
res.cN.nN <- 
  results(mir_dds_HPV, contrast=c("HistHPV","cancerN","normalN"), tidy = T)
# Contrasting between HPV negative and positive
res.cP.cN <- 
  results(mir_dds_HPV, contrast=c("HistHPV","cancerP","cancerN"), tidy = T)

# selecting significant results
res.cP.cN <- dplyr::filter(res.cP.cN, padj < 0.1)
res.cN.nN <- dplyr::filter(res.cN.nN, padj < 0.1)

# excluding the cancer infulence
mir_HPV <- res.cP.cN[!is.element(res.cP.cN$row, res.cN.nN$row),]

mir_HPV
##                                       row   baseMean log2FoldChange     lfcSE
## 2                         hsa-miR-106a-5p  19.477681       2.101696 0.5436871
## 3                          hsa-miR-223-3p 185.415712      -1.197091 0.3587898
## 4                            hsa-miR-9-5p 412.771663       3.044664 0.6038572
## 5                            hsa-miR-9-3p   4.586593       2.336957 0.6819823
## 7                          hsa-miR-20b-5p  32.034451       2.376377 0.6252002
## 8                          hsa-miR-512-3p   1.512275      -2.561579 0.7777937
## 9    15_TCTTTGGTTATCTAGCTGTAT_hsa-mir-9-1   7.596875       4.434122 0.7640074
## 10   15_TCTTTGGTTATCTAGCTGTAT_hsa-mir-9-2   7.724201       3.370782 0.6892955
## 11   15_TCTTTGGTTATCTAGCTGTAT_hsa-mir-9-3   7.579966       2.942359 0.6781000
## 12  15_TCTTTGGTTATCTAGCTGTATG_hsa-mir-9-1  13.985244       2.849790 0.6413319
## 13  15_TCTTTGGTTATCTAGCTGTATG_hsa-mir-9-2  14.678740       3.270937 0.6932560
## 14  15_TCTTTGGTTATCTAGCTGTATG_hsa-mir-9-3  14.331517       3.216853 0.6763908
## 15   15_TTATGGTTTGCCTGGGACTGA_hsa-mir-584  33.754315      -1.927966 0.4920134
## 21 49_AATTGCACGGTATCCATCTGTAT_hsa-mir-363  37.467233       2.478019 0.5739185
## 22   5_CAAAGTGCTCATAGTGCAGGTA_hsa-mir-20b   3.705624       2.248063 0.6636685
## 23 5_CAAAGTGCTCATAGTGCAGGTAGA_hsa-mir-20b   4.186673       2.487489 0.7442990
## 24 5_CAAAGTGCTCATAGTGCAGGTAGT_hsa-mir-20b  11.510466       2.657850 0.7243385
## 25    50_ATTGCACGGTATCCATCTGT_hsa-mir-363  18.582416       1.925582 0.5581465
## 26  50_ATTGCACGGTATCCATCTGTAA_hsa-mir-363  10.831970       2.460293 0.5737673
## 27  50_ATTGCACGGTATCCATCTGTAT_hsa-mir-363  10.806923       2.391778 0.6827111
##         stat       pvalue         padj
## 2   3.865635 1.108006e-04 2.476393e-02
## 3  -3.336469 8.485003e-04 9.155025e-02
## 4   5.042026 4.606279e-07 7.206524e-04
## 5   3.426712 6.109366e-04 8.007564e-02
## 7   3.800985 1.441221e-04 3.006386e-02
## 8  -3.293391 9.898680e-04 9.991281e-02
## 9   5.803769 6.484062e-09 2.028863e-05
## 10  4.890184 1.007420e-06 1.050739e-03
## 11  4.339123 1.430527e-05 4.932882e-03
## 12  4.443550 8.848675e-06 4.614584e-03
## 13  4.718224 2.379125e-06 1.488857e-03
## 14  4.755909 1.975550e-06 1.488857e-03
## 15 -3.918523 8.909336e-05 2.144409e-02
## 21  4.317718 1.576504e-05 4.932882e-03
## 22  3.387327 7.057715e-04 8.493688e-02
## 23  3.342056 8.316018e-04 9.155025e-02
## 24  3.669348 2.431697e-04 4.670797e-02
## 25  3.449959 5.606721e-04 8.007564e-02
## 26  4.287963 1.803194e-05 5.129266e-03
## 27  3.503354 4.594393e-04 7.566240e-02

Volcano plot

volcano <- mir_res
LFC.TRESH <- 1
PVAL.TRESH <- 0.05

# Building volcano plot
volcano <- as.data.frame(mir_res) %>% 
  dplyr::filter(!is.na(padj)) %>% #filtering NA's
  dplyr::mutate(diffexpressed = dplyr::case_when(
    padj < PVAL.TRESH & log2FoldChange > LFC.TRESH ~ "UP",
    padj < PVAL.TRESH & log2FoldChange < -LFC.TRESH ~ "DOWN",
    TRUE ~ "NO")) 

# Volcano colors
volcano_cols <- c("red", "gray60", "green")

# p value tresh for delabeling
p_volcano_tresh <- arrange(volcano,padj)[12, "padj"]
volcano <- volcano %>%
  mutate(delabel = ifelse(padj <= p_volcano_tresh,
                          rownames(volcano),
                          NA))
  

ggplot(data=as.data.frame(volcano), 
       aes(x=log2FoldChange, 
           y=-log10(padj), 
           col=diffexpressed))+
  geom_point()+
  theme_minimal()+
  scale_color_manual(values = volcano_cols)

ggplot(data=as.data.frame(volcano), 
       aes(x=log2FoldChange, 
           y=-log10(padj), 
           col=diffexpressed))+
  geom_point()+
  theme_minimal()+
  scale_color_manual(values = volcano_cols)+
  geom_text_repel(aes(label = delabel), show.legend = FALSE,
                  size =3, max.overlaps = 50)
## Warning: Removed 2229 rows containing missing values (geom_text_repel).

Counts plot
Visualizing counts for genes

# Using the list of genes that are labeled in volcano plot
genes <- rownames(filter(volcano, !is.na(delabel)))

# Building an object for multiple counts plot
op <- par(mfrow = c(3,4),
          oma = c(2,2,0,0) + 1,
          mar = c(0,0,1,1) + 1.2)
# Plotting
for(gene in genes){
  plotCounts(mir_dds, gene = gene, intgroup = c("Hist"),
             returnData = F, pch = 19)
}

par(op)

DNAm

Raw data

Raw DNAm data was previously published in Gasperov paper: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7554960/ Raw data was preprocessed with Champ tool - normalization and batch effect removal. For simplicity preprocesed Rdata was loaded in this notebook. Data was provided from Sabol.

Loading data

For this analysis only cancer samples that have available microRNA and mRNA data will be considered. Since there is no methylation data for control samples of which there are microRNA and mRNA data, other control samples will be used for all analysis.

Methylation data is saved in an object from the RnBeads class.

load("inputs/rnb.set_Combat.RData")

# Renaming the rnb set
rnb_set <- rnb.set_Combat
rm(rnb.set_Combat)

This methylation dataset contains data about lesions which was not considered in this research. Lesion samples are removed here. Furthermore, additional phenotype data is added here. The xlsx table was downloaded from cloud and is called additional_columns_impute.xlsx and renamed to DNAm_additional_pheno.xlsx

# Removing lesions that are not included in the analysis
rnb_set <- 
  remove.samples(rnb_set, 
                 samplelist = (pheno(rnb_set)$Sample_group == "Lesion"))

dnam_pheno <-
  read_xlsx("inputs/DNAm_additional_pheno.xlsx", range = cell_cols("A:M"))

# Filtering out Lesions and renaming HPV values to P and N
dnam_pheno <-
  dnam_pheno %>%
  filter(Sample_group != "Lesion") %>%
  mutate(HPV = ifelse(HPV == 1 , "P", "N")) 

# Adding HPV and other phenotype data
rnb_set <- addPheno(rnb_set, dnam_pheno$HPV, "HPV")
rnb_set <- addPheno(rnb_set, dnam_pheno$Gender, "Gender")
rnb_set <- addPheno(rnb_set, dnam_pheno$AgeGroup, "Age_group")

Exploratory analysis

For DNAm data there are multiple regions that could be taken into consideration. Some of the standards are CpG sites that evaluate individual CpG’s, promoters (-2000, 500 from TSS), gene (gene body) and CpG islands (CGI are ranges with higher frequency of CpG’s that are previously annotated and are known).

PCA plot PCA plots could be generated over multiple regions. Individual CpG is computationaly expensive to calculate so only PCA on CGI is preformed

# Preforming dimension reduction on CpG islands
dred_islands <- rnb.execute.dreduction(rnb_set, target ="cpgislands")

# Building dnam_pca_data table for plotting
dnam_pca_data <- as.data.frame(dred_islands$pca$x[,c("PC1","PC2")])

# Additional annotation data
dnam_pca_data <- 
  cbind(dnam_pca_data,
        dplyr::select(as.data.frame(pheno(rnb_set)), 
                      Sample_group, HPV, Gender, Age_group))


ggplot(dnam_pca_data,aes(x=PC1,y=PC2, color=Sample_group, shape = HPV))+
  geom_point(data = dnam_pca_data, size=4)+
  ggtitle("PCA plot - CpG islands")+
  theme(panel.grid = element_blank(), 
        panel.background = element_rect(fill = "white"),
        panel.border = element_rect(fill = "transparent"))+
  geom_text_repel(aes(label = rownames(dnam_pca_data)), size =3)

Sample distance
To asses overall similarity between samples we calculate sample distance using the R function dist to calculate the Euclidean distance.

Triangle sample distance heatmap

# dataframe with annotation rows for triangle heatmap
dnam_annot_row <- dplyr::select(as.data.frame(pheno(rnb_set)), 
                                Sample_group, HPV, Gender, Age_group)

# Annotation dfs need to have samples as rownames
rownames(dnam_annot_row) <- pheno(rnb_set)$Sample_ID

# Call the local function
triangleHeatMap(meth(rnb_set),
                color = "Greens",
                annot_row = dnam_annot_row)

Differential methylation

Differential methylation can be calculated for specific sites or genomic regions. Since promoter and gene (gene body) regions are arbitrary picked ranges, CpG islands regions are used for differential methylation analysis. To see meaning of each column look at RnBeads vignette. For most of the analysis mean.mean.diff was taken into consideration because the value is similar as fold change in sequencing data.

diff_meth <- rnb.execute.computeDiffMeth(rnb_set,
                                         pheno.cols = "Sample_group",
                                         region.types = "cpgislands")
## 2021-11-03 11:40:51     3.3  STATUS STARTED Retrieving comparison info
## 2021-11-03 11:40:51     3.3  STATUS COMPLETED Retrieving comparison info
## 
## 2021-11-03 11:40:51     3.3  STATUS STARTED Computing differential methylation tables
## 2021-11-03 11:40:51     3.3  STATUS     STARTED Comparing: Cancer vs. Normal (based on Sample_group)
## 2021-11-03 11:40:51     3.3  STATUS         STARTED Computing Differential Methylation Table
## 2021-11-03 11:40:52     3.2    INFO             Conducting differential analysis using limma
## 2021-11-03 11:40:56     3.9  STATUS         COMPLETED Computing Differential Methylation Table
## 2021-11-03 11:40:56     4.0  STATUS         STARTED Computing Differential Methylation Tables (Region Level)
## 2021-11-03 11:41:10     3.2  STATUS             Computed table for cpgislands
## 2021-11-03 11:41:10     3.2  STATUS         COMPLETED Computing Differential Methylation Tables (Region Level)
## 2021-11-03 11:41:10     3.2  STATUS     COMPLETED Comparing: Cancer vs. Normal (based on Sample_group)
## 2021-11-03 11:41:10     3.2  STATUS COMPLETED Computing differential methylation tables
# Diff methylation table for CpG islands
dnam_res <- get.table(diff_meth,
                      get.comparisons(diff_meth)[1],
                      "cpgislands",
                      return.data.frame =T)

# Diff methylation table for CpG sites, used for volcano plot
dnam_res_sites <- get.table(diff_meth,
                      get.comparisons(diff_meth)[1],
                      "sites",
                      return.data.frame =T)


head(dnam_res)
##   mean.mean.g1 mean.mean.g2 mean.mean.diff mean.mean.quot.log2  comb.p.val
## 1  0.023044587  0.023533981  -0.0004893938          -0.1369526 0.099288297
## 2  0.033497661  0.024413333   0.0090843283           0.3374428 0.069531360
## 3  0.295863132  0.269334721   0.0265284110           0.1750961 0.123067217
## 4  0.014025619  0.024321920  -0.0102963011          -0.7440855 0.057410806
## 5  0.007239818  0.008961348  -0.0017215307          -0.1373170 0.402870410
## 6  0.329051525  0.272994051   0.0560574747           0.3087724 0.004687685
##   comb.p.adj.fdr num.sites mean.num.na.g1 mean.num.na.g2 combinedRank
## 1     0.13005057         2              0              0        23798
## 2     0.10156993         2              0              0        16924
## 3     0.15221326         3              0              0        19989
## 4     0.08993014         3              0              0        15783
## 5     0.42145158         1              0              0        23633
## 6     0.01697826         6              0              0        12393

Summary statistic of significant cpg island differential methylation.

summary(filter(dnam_res, comb.p.adj.fdr < 0.05))
##   mean.mean.g1        mean.mean.g2       mean.mean.diff     
##  Min.   :0.0000295   Min.   :0.0004438   Min.   :-0.769649  
##  1st Qu.:0.0571093   1st Qu.:0.0357609   1st Qu.:-0.079607  
##  Median :0.1969202   Median :0.0912668   Median : 0.001330  
##  Mean   :0.3064860   Mean   :0.3116314   Mean   :-0.005145  
##  3rd Qu.:0.5428289   3rd Qu.:0.6654861   3rd Qu.: 0.072257  
##  Max.   :0.9798466   Max.   :0.9968258   Max.   : 0.637812  
##  mean.mean.quot.log2   comb.p.val       comb.p.adj.fdr       num.sites     
##  Min.   :-5.06726    Min.   :0.000000   Min.   :0.000000   Min.   : 1.000  
##  1st Qu.:-0.44404    1st Qu.:0.000153   1st Qu.:0.001384   1st Qu.: 3.000  
##  Median :-0.08196    Median :0.001966   Median :0.008894   Median : 4.000  
##  Mean   : 0.14058    Mean   :0.005250   Mean   :0.015202   Mean   : 5.189  
##  3rd Qu.: 0.70880    3rd Qu.:0.008985   3rd Qu.:0.027093   3rd Qu.: 6.000  
##  Max.   : 4.53599    Max.   :0.022096   Max.   :0.049970   Max.   :74.000  
##  mean.num.na.g1 mean.num.na.g2  combinedRank  
##  Min.   :0      Min.   :0      Min.   :   18  
##  1st Qu.:0      1st Qu.:0      1st Qu.: 6123  
##  Median :0      Median :0      Median :10954  
##  Mean   :0      Mean   :0      Mean   :11566  
##  3rd Qu.:0      3rd Qu.:0      3rd Qu.:16850  
##  Max.   :0      Max.   :0      Max.   :24722

Global DNA hypomethylation was observed in most cancer tissues and hypermethylation of certain regions. Since hypomethylation happens at the level of the genome it is hard to pinpoint if there are direct oncogenic effect or is it a secondary result of malignancy. However , DNA hypermethylation on CpG rich locations named CpG islands which may affect gene transcirption or chromatin structure. Here we explore influecne of differential methylation of cpg islands in promoter regions.

Making GRanges for cpg island data then finding overlaps within hg19 promoter regions. Promoter regions are defined as -2000 and +500 bases from the transcription start site (TSS).

# Combining dnam res data with granges data
dnam_res_cgi <- 
  data.frame(dplyr::select(as.data.frame(annotation(rnb_set, "cpgislands")),
                    Chromosome,Start,End,Strand),
             dplyr::select(as.data.frame(dnam_res),
                    log.quot="mean.mean.quot.log2",
                    pvalue="comb.p.val",
                    padj="comb.p.adj.fdr",
                    num.sites))

# Making Granges object
dnam_res_cgi <-
  makeGRangesFromDataFrame(dnam_res_cgi,keep.extra.columns=TRUE)

# Getting promoter ranges with gene names
promoter_ranges <- unlist(rnb.get.annotation(type="promoters", assembly="hg19"),
                          use.names = F) %>%
  dplyr::select(symbol)

# Transfering ensembl genes to a column so that it does not lose in join overlaps
promoter_ranges$ensembl <- names(promoter_ranges)


# Overlaping cgi with promoter of genes
dnam_res_cgi <-
  join_overlap_left(dnam_res_cgi, promoter_ranges)

Volcano plot
Dodati neki tekst tu

LFC.TRESH <- 0
PVAL.TRESH <- 0.05
volcano_cols <- c("green","red",  "gray60")

# volcano data frame
volcano <- data.frame(values(dnam_res_cgi)) %>%
  dplyr::mutate(diffexpressed = dplyr::case_when(
    padj < PVAL.TRESH & log.quot > LFC.TRESH ~ "Hypermethylated",
    padj < PVAL.TRESH & log.quot < -LFC.TRESH ~ "Hypomethylated",
    TRUE ~ "Not significant"))

ggplot(data=as.data.frame(volcano), 
       aes(x=log.quot, 
           y=-log10(padj), 
           col=diffexpressed))+
  geom_point()+
  theme_minimal()+
  scale_color_manual(values = volcano_cols)

table(volcano$diffexpressed)
## 
## Hypermethylated  Hypomethylated Not significant 
##            5457            6557           16817

Since there are too many points on the volcano plot for differential methylation on cpg sites a histogram is used to visualize that most of cpg sites are hypomethylated. CpGs with padj < 0.05 are used for construction of histogram

# Volcano data frame
dnam_res_sites <- 
  dplyr::select(dnam_res_sites, 
         log.quot="mean.quot.log2",
         pvalue="diffmeth.p.val",
         padj="diffmeth.p.adj.fdr") 

hist(dplyr::filter(dnam_res_sites, padj < 0.05)$log.quot, 
     main="Histogram of CpG sites with padj < 0.05",
     xlab="log.quot",
     col="darkgreen",
     border="white")

GSEA analysis
Preforming gsea analysis on genes that have differentially methylated cpg islands in promoters. Number of genes are around 5700. KEGG pathways like Pathways in cancer, Breast cancer, Rap1 signaling pathway

# Using only part of sources
GSEA_SOURCES <- c("GO:BP","CORUM","KEGG","REAC","WP","MIRNA")

gost_DNAm <-
  gost(filter(dnam_res_cgi, padj < 0.05 & !is.na(ensembl))$ensembl,
       organism= "hsapiens",
       exclude_iea=TRUE,
       domain_scope="annotated",
       ordered_query=F,
       sources=GSEA_SOURCES)

# Change query name
gost_DNAm$result$query <- "CGI"

#Plot and table of significant results
gostplot(gost_DNAm ,capped = F, interactive = T)

Session Info

sessionInfo()
## R version 4.1.1 (2021-08-10)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.3 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=hr_HR.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=hr_HR.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=hr_HR.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=hr_HR.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
##  [1] grid      stats4    parallel  stats     graphics  grDevices utils    
##  [8] datasets  methods   base     
## 
## other attached packages:
##  [1] biomaRt_2.48.3                         
##  [2] dplyr_1.0.7                            
##  [3] plyranges_1.12.1                       
##  [4] readxl_1.3.1                           
##  [5] RnBeads.hg19_1.24.0                    
##  [6] RnBeads_2.10.0                         
##  [7] plyr_1.8.6                             
##  [8] methylumi_2.38.0                       
##  [9] minfi_1.38.0                           
## [10] bumphunter_1.34.0                      
## [11] locfit_1.5-9.4                         
## [12] iterators_1.0.13                       
## [13] foreach_1.5.1                          
## [14] Biostrings_2.60.2                      
## [15] XVector_0.32.0                         
## [16] FDb.InfiniumMethylation.hg19_2.2.0     
## [17] org.Hs.eg.db_3.13.0                    
## [18] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
## [19] reshape2_1.4.4                         
## [20] scales_1.1.1                           
## [21] illuminaio_0.34.0                      
## [22] limma_3.48.3                           
## [23] gridExtra_2.3                          
## [24] gplots_3.1.1                           
## [25] fields_12.5                            
## [26] viridis_0.6.2                          
## [27] viridisLite_0.4.0                      
## [28] spam_2.7-0                             
## [29] dotCall64_1.0-1                        
## [30] ff_4.0.4                               
## [31] bit_4.0.4                              
## [32] cluster_2.1.2                          
## [33] MASS_7.3-54                            
## [34] gprofiler2_0.2.1                       
## [35] ggrepel_0.9.1                          
## [36] ggplot2_3.3.5                          
## [37] RColorBrewer_1.1-2                     
## [38] pheatmap_1.0.12                        
## [39] DESeq2_1.32.0                          
## [40] SummarizedExperiment_1.22.0            
## [41] MatrixGenerics_1.4.3                   
## [42] matrixStats_0.61.0                     
## [43] tximport_1.20.0                        
## [44] stringr_1.4.0                          
## [45] GenomicFeatures_1.44.2                 
## [46] AnnotationDbi_1.54.1                   
## [47] Biobase_2.52.0                         
## [48] GenomicRanges_1.44.0                   
## [49] GenomeInfoDb_1.28.4                    
## [50] IRanges_2.26.0                         
## [51] S4Vectors_0.30.2                       
## [52] BiocGenerics_0.38.0                    
## [53] BiocManager_1.30.16                    
## [54] rmarkdown_2.11                         
## 
## loaded via a namespace (and not attached):
##   [1] utf8_1.2.2                tidyselect_1.1.1         
##   [3] RSQLite_2.2.8             htmlwidgets_1.5.4        
##   [5] BiocParallel_1.26.2       munsell_0.5.0            
##   [7] codetools_0.2-18          preprocessCore_1.54.0    
##   [9] withr_2.4.2               colorspace_2.0-2         
##  [11] filelock_1.0.2            highr_0.9                
##  [13] knitr_1.36                rstudioapi_0.13          
##  [15] labeling_0.4.2            GenomeInfoDbData_1.2.6   
##  [17] farver_2.1.0              bit64_4.0.5              
##  [19] rhdf5_2.36.0              vctrs_0.3.8              
##  [21] generics_0.1.1            xfun_0.27                
##  [23] BiocFileCache_2.0.0       R6_2.5.1                 
##  [25] bitops_1.0-7              rhdf5filters_1.4.0       
##  [27] cachem_1.0.6              reshape_0.8.8            
##  [29] DelayedArray_0.18.0       assertthat_0.2.1         
##  [31] vroom_1.5.5               BiocIO_1.2.0             
##  [33] gtable_0.3.0              rlang_0.4.12             
##  [35] genefilter_1.74.1         splines_4.1.1            
##  [37] rtracklayer_1.52.1        lazyeval_0.2.2           
##  [39] GEOquery_2.60.0           yaml_2.2.1               
##  [41] crosstalk_1.1.1           tools_4.1.1              
##  [43] nor1mix_1.3-0             ellipsis_0.3.2           
##  [45] jquerylib_0.1.4           siggenes_1.66.0          
##  [47] Rcpp_1.0.7                sparseMatrixStats_1.4.2  
##  [49] progress_1.2.2            zlibbioc_1.38.0          
##  [51] purrr_0.3.4               RCurl_1.98-1.5           
##  [53] prettyunits_1.1.1         openssl_1.4.5            
##  [55] magrittr_2.0.1            data.table_1.14.2        
##  [57] hms_1.1.1                 evaluate_0.14            
##  [59] xtable_1.8-4              XML_3.99-0.8             
##  [61] mclust_5.4.7              compiler_4.1.1           
##  [63] tibble_3.1.5              maps_3.4.0               
##  [65] KernSmooth_2.23-20        crayon_1.4.1             
##  [67] htmltools_0.5.2           tzdb_0.2.0               
##  [69] tidyr_1.1.4               geneplotter_1.70.0       
##  [71] DBI_1.1.1                 dbplyr_2.1.1             
##  [73] rappdirs_0.3.3            Matrix_1.3-4             
##  [75] readr_2.0.2               cli_3.1.0                
##  [77] quadprog_1.5-8            pkgconfig_2.0.3          
##  [79] GenomicAlignments_1.28.0  plotly_4.10.0            
##  [81] xml2_1.3.2                annotate_1.70.0          
##  [83] bslib_0.3.1               rngtools_1.5.2           
##  [85] multtest_2.48.0           beanplot_1.2             
##  [87] doRNG_1.8.2               scrime_1.3.5             
##  [89] digest_0.6.28             base64_2.0               
##  [91] cellranger_1.1.0          DelayedMatrixStats_1.14.3
##  [93] restfulr_0.0.13           curl_4.3.2               
##  [95] Rsamtools_2.8.0           gtools_3.9.2             
##  [97] rjson_0.2.20              lifecycle_1.0.1          
##  [99] nlme_3.1-153              jsonlite_1.7.2           
## [101] Rhdf5lib_1.14.2           askpass_1.1              
## [103] fansi_0.5.0               pillar_1.6.4             
## [105] lattice_0.20-45           KEGGREST_1.32.0          
## [107] fastmap_1.1.0             httr_1.4.2               
## [109] survival_3.2-13           glue_1.4.2               
## [111] png_0.1-7                 stringi_1.7.5            
## [113] sass_0.4.0                HDF5Array_1.20.0         
## [115] blob_1.2.2                caTools_1.18.2           
## [117] memoise_2.0.0